Boolean Compilation of Relational Specifications 


Daniel J ackson 
MIT Lab for Computer Science 
January 1998 


MIT-LCS-TR-735 


Abstract 


A new method for analyzing relational specifications is described. A prop- 
erty to be checked is cast as a relational formula, which, if the property 
holds, has no finite models. T he relational formula is translated into a boo- 
lean formula that has a model for every modded of the relational formula 
within some finite scope. Errors in specifications can usually be demon- 
strated with small counterexamples, so a small scope often suffices. The 
boolean formula is solved by an off-the-shelf satisfier. 


T he satisfier requires that the boolean formula bein conjunctive normal 
form (CN F). A naive translation to CN F fails (by exhausting memory) for 
realistic specifications. T his paper presents a preliminary design of a com- 
piler that overcomes this problem, by exploiting typical features of the rela- 
tional formulae that arise in practice. Initial experiments suggest that this 
method scales more readily than existing approaches and will be able to find 
more errors, in larger specifications. 


Keywords: Software design analysis, formal specification, object mode, Z, 
relational calculus, model finding, boolean satisfaction, WalkSAT, N itpick. 


1 Introduction 


T wo views are central in software design. The event view is concerned with 
which events may occur during execution of the system, and their ordering 
patterns. T he object configuration view is concerned with which objects exist 
and their relationships to one another. Even though these views are not 
disjoint - as events occur, configurations change - it is often useful to ex- 
amine them independently. 


M ost formal specification languages have been developed with primarily 
one of these views in mind: the event view for CSP, Statecharts and L otos; 
the object configuration view for Z, VDM and Larch. O bject oriented 
methods, such asO MT, Fusion and Syntropy, typically advocate the con- 
struction of a model for each view: the “dynamic model” for the event view 
(typically expressed as a Statechart) and the “static mode” for the object 
configuration view (typically expressed as an entity-relationship diagram). 


T he analysis of event views is well supported by tools: in addition to simu- 
lators, there are now model checkers that can exhaustively analyze huge state 
spaces. O bject configuration views, in contrast, are poorly supported. Ex- 
isting automated tools offer only shallow syntactic checks of the description, 
and rarely expose problems in the software design itself. 


This disparity is particularly renarkable given the centrality of object con- 
figurations. The event view is sometimes dominant (for example, in the de 
sign of protocols and process control systems), but usually plays a subsidiary 
role. Take a glance at an influential book on design patterns [G +95], and 
note the proportion of diagrams that describe object configurations. 


N itpick is an attempt to redress this imbalance. Given a constraint describ- 
ing a set of object configurations, Nitpick can generate instances satisfying 
the constraint. It can check that declared constraints have intended conse 
quences, by searching for instances that satisfy one constraint but not an- 
other. N itpick also accepts descriptions of operations that describe transi- 
tions from one configuration to another, and can simulate these, and check 
a variety of ther properties, such as whether an operation preserves a con- 
straint. 


The current implementation of N itpick works by explicit enumeration. 

U sing a variety of mechanisms to prune the search tree, it can analyze 
enormous configuration spaces. T he tool has been used successfully to find 
flaws in anumber of small designs, including a published draft of a mobile 
internet routing protocol [N 997, JN W497]. Even small designs, however, 
tend to have huge configuration spaces. | ndeed, this is perhaps why analysis 
of object configurations has not attracted the same attention as the analysis 
of state machines. A relation over a domain of 3 elements has 512 possible 
values, adding a single such relation to a specification thus increases the 
space by 3 orders of magnitude. 


This paper describes a new method for analyzing descriptions of object 
configurations. Section 2 presents a toy example that conveys the flavour of 
the specification notation; Section 3 compares the notation to popular no- 
tations. Section 4 demonstrates a N itpick analysis of the example. 


Section 5 explains how simulation and checking can both be reduced to 
mode’ finding. Section 6 articulates the hypothesis that most errors can be 
illustrated with small counterexamples. 


Section 7 gives the intutions behind the new method. Sections 7 to 16 are 
the core of the paper; they present the structure of the compiler: the se 
quence of representations and their respective transformations. 


Section 17 gives some performance results that compare the new method to 
our previous explicit method. Section 18 and 19 discuss related work and 
future plans. 


2 Relational Notation: NP 


Our notation, NP, is roughly a subset of Z [Spi92]. A reference manual 
[) D 96b] describes the notation in detail. H ere, we explain its basic elements 
with atoy example. 


[Ph, Num] 


Switch = [ 
called : Ph <-> Num 
const net : Num -> Ph 
conns : Ph <-> Ph 
| 


conns = called ; net ] 


Join (p : Ph; n: Num) = [ 
Switch 
| 
p in dom called 
n notin ran called 
called’ = called U {p -> n}] 


JoinOK (p : Ph; n: Num) ::[ 
Switch | Join (p, n) and inj conns => inj conns’ ] 


Figurel: NP specification of ESS 


The Extremdy Simple Switch (ESS) connects telephones by maintaining 
two relationships: called, which maps phones to numbers, and represents 
ongoing phone calls, and net, a constant relationship that maps numbers to 
the phones they name. When (p1, n) belongs to called, phone pl has an 
active connection to the number n; when (n, p2) belongs to n4, p2 is the 
phone specified by the number n. The composition of called and net isa 
relation conns that associates phones that are connected; it would include, in 
this case, the pair (p1, p2). 


Part of a(faulty) specification for ESS is shown in Figure 1. The first line 
declares Ph and N um to be primitive types. T he schema Switch that follows 
declares the three state components with an accompanying constraint that 
serves to define conns - redundant and introduced only for the convenience 
of later expressing certain properties - in terms of the other two. 


The form of the arrow in a declaration indicates the kind of relation. T hus 
called is an arbitrary relation, allowing the modelling of conference calls in 
which a phone has called more than one number at a time, but ne isa 
function. 


Switch defines a set of configurations. Entity relationship diagrams can be 
used for the same purpose, but are not as expressive. The diagram of Figure 
2 corresponds to the declarations of Switch, but does not express the con- 
straint amongst the three relations. 


Figure2: ER diagram corresponding to Switch schema of Figure 1 


The schema] oin specifies an operation in which a new party is added to a 
call. The parameter p is the phone at which the join is performed; it is con- 
strained to be already making a call. The parameter n is the new number 
being called, and is constrained not to be already called. The effect of the 
operation is given by the last constraint. H ere, the primed instance of the 
variable called denotes the value of the state component after the operation; 
it thus asserts that the operation adds the pair (p, n) to the called relation. 
Thene relation, having been declared in the imported schema Switch to be 
constant, does not change; it is as if the assertion 


née’ =ne 


were included. Beyond this, there are no implicit frame conditions. In fact, 
the change in called will generally require a corresponding change in conns. 


The third schema, JoinOK, isaclaim. It isnot part of the specification 
proper. Rather, it asserts a property of the specification: that if the relation 
connsis injective prior to execution of ajoin, it will be injective afterwards 
too. If connsis not injective, there is a call involving two phones acting as 
callers; for billing reasons, we prefer to avoid this situation, and construct all 
conference calls with a single caller. 


3 Why Relational Specification? 


This style of specification was pioneered by the developers of the Z specifi- 
cation language. For readers familiar with other notations, we shall explain 
briefly what features of the relational notation make it, in our opinion, a 
more suitable notation for describing object configuration views. 


The principal difference between our language, N P, and Z is that N P is 
first-order: its data structures are simple relations, and it does not admit, for 
example, relations between relations. While Z is based on set theory, NP is 
based on the relational calculus. Z’s extra power makes it less tractable; N P 
gains much in tractability for a relatively small reduction in expressiveness. 


The formal specification languages VD M and Larch have much in common 
with Z. Their data structures, however, are not based on set theory but 
rather on acollection of data types each with its own axiomatization. Ina 
sense, N P is less abstract, since all data structures are encoded as graphs. But 
for describing configurations, graphs are so natural that bias is rare, and 
more abstract types are not needed. 


The input languages of model checkers, such as SM V’s assignment lan- 
guage, M urphi’s U nity-like language and SPIN ’s Promela tend to have 
weak support for data structures, and do not provide primitives for com- 
posing relations and applying functions. T hey also tend not to support im- 
plicit specification fully, in which the result of an operation is defined by an 
assertion relating the variables of pre- and post-states, rather than by an ex- 
plicit assignment. 


O bject-oriented methods, such as UML, OMT, Fusion and Syntropy, pro- 
vide graphical notations for describing object modds. T hese are extremely 
weak, and allow only the most elementary constraints between relations 
(such as that one is a subset of another) to be expressed. V arious textual an- 
notation languages have been designed, most recently UM L’s O bject Con- 
straint Language (O CL) [I1BM 97], which originated in the constraints of 
Syntropy. OCL is very similar to N P, although it does not have a formal 
semantics and does not appear to be able to express transitive closure. Its 
syntax looks very different from N P’s however; it makes heavy use of scalar 
quantifiers in place of relational operators, and has no structuring mecha- 
nism akin to schemas. 


In practice, object models are much less powerful than relational models, 
since they are usually interpreted in terms of implementation constructs. So 
subset, for example, is usually construed as subclass. Since standard object- 
oriented languages do not allow objects to migrate between subclasses, this 
means that the members of a subset cannot change. Unlikein NP, thee 
fore, one could not introduce a set Callers that is a subset of Ph, corre 
sponding to the phones making calls, since this set changes over time and 
thus cannot bea subclass. 


4 Nitpick in Action 


Let’s see what N itpick can do with a specification. It basically offers two 
features: simulation and checking. Simulating a state schema produces in- 
stances of the state that satisfy the given constraints. Instructing N itpick to 
simulate Switch in Figure 1, for example, will cause it to output the rather 
uninteresting case: 


called: {} 
net: {} 
conns: {} 


Simulating an operation schema produces sample transitions. For Join, Nit- 
pick will produce 


p: P2 

n: N2 

called: {P2->N1} 

net: {} 

conns: {} 

called’: {P2->N1,P2->N2} 
conns’: {} 


exposing a simple flaw: we have allowed active calls to numbers that are not 
associated with phones. 


Checking a claim also causes instances to be generated. The claim JoinOK is 
about an operation, and the instances will thus be transitions. T hese in- 
stances, however, are counterexamples: they correspond to cases that show 
the property to be invalid. For JoinOK, Nitpick will attempt to find a tran- 
sition of Join from a state in which connsis injective to a state in which it is 
not, such as 


p: P2 

n: N2 

called: {P1->N1,P2->NO} 

net: {N1->P2,N2->P2} 

conns: {P1->P2} 

called’: {P1->N1,P2->NO0,P2->N2} 
conns': {P1->P2,P2->P2} 


T his counterexample demonstrates (at least) two flaws with the specifica- 
tion. In the pre-state, called maps a phone to anumber that is not in the 
domain of nat; that is, there is an active call to anumber not associated with 
a phone The ne function is not injective - there are two numbers mapped 
to the same phone; either this must be ruled out, or the definition of the 
operation must take it into account. 


5 Specification Analysis as Model Finding 


Both simulation and checking amount to the same problem: solving a rela- 
tional formula. Solving a formula means finding an assignment to its vari- 
ables that makes the formula true such an assignment is a modd of the for- 
mula. T he formula is extracted from the specification by simply undoing 
the syntactic shorthands; from J oinOK, for example, N itpick obtains the 
formula 


conns = called ; net 

and conns’ = called’ ; net 

and pin dom called 

and n not in ran called 

and called’ = called U {p -> n} 
and inj conns 

and not inj conns’ 


A couple of features of this formula are worth noting. First, note that in 
translating a claim, the formula is negated: for a claim, an assignment for 


which the formula is false is required, namely amodd of the formula’s ne 
gation. Second, since a state invariant constrains the states both before and 
after an operation, the inclusion of the schema Switch brings two subfor- 
mulae: one on the unprimed variables, and one on the primed variables. 
Third, observe the structure of the formula as a whole: a conjunction of 
(mostly positive) eementary subformulae. This is typical, and our method 
exploits it. 


T he free variables of such a formula may denote scalars, sets or rdations. A 
set of type declarations, not shown above, is associated with each formula. 
For example, the variable p is constrained to be a scalar of type Ph. In con- 
structing an assignment, therefore, we must first choose sets of values for the 
primitive types. O ur types are uninterpreted, so we use symbolic names for 
the dements: p1, p2, p3 for Ph, say. Given this set of 3 values for Ph, the 
variable p can take on 3 possible values. T he variable conns can take on 512 
values: there are 9 possible connections from a phone to a phone, each of 
which is present or absent. 


6 The Small Scope Hypothesis 


Every relational formula has infinitely many possible assignments. To make 
a finite search possible, we place an artificial bound by limiting the number 
of atoms in each type A scope S isa mapping from the type names of a 
specification to natural numbers; S(t) is interpreted as a bound on thenum- 
ber of atoms in the type t. We say than an assignment is within a scope, or 
that a scope admits an assignment, when the values of the variables in the 
assignment can be constructed using no more atoms than the scope allows. 
In our example, to consider only assignments involving at most 3 phones 
and 2 numbers, we would choose S(Ph) =3 and S(Num) =2. We shall say 
that a search has a scope of S when it considers only assignments within S. 
Sometimes we shall refer loosely to a “scope of 3”, which simply means that 
S(t) =3 for every typet. 

The problem of finding models of relational formulas is highly intractable, 
since the number of assignments grows so rapidly with both the scope and 
the number of variables. For a scope of k, a relation can have k’ edges and 
thus 2to the power k’ values, The size of the space is the product of the 
number of values of each variable, so for v relational variables, there may be 
2 to the power kv assignments. Increasing the scope from k to k+1 thus 
increases the space by 27", 


It therefore seems very unlikely that a method based on search will be able 
to handle large scopes. T his, perhaps, is why research on analysis of lan- 
guages such as Z and VDM has focused almost exclusively on syntactic and 
not semantic methods. 


The hypothesis underlying N itpick is a controversial one. It is that, in prac- 
tice, small scopes suffice. In other words, most errors can be demonstrated 
by counterexamples within a small scope Thisis a purely empirical hy- 


pothesis, since the relevant distribution of errors cannot be described 
mathematically: it is determined by the specifications people write 


O ur hope is that successful use of the N itpick tool will justify the hypothe 
sis. T here is some evidence already for its plausibility. In our experience 
with N itpick to date, we have not gained further information by increasing 
the scope beyond 6. 


A similar notion of scope is implicit in the context of model checking of 
hardware. Although the individual state machines are usually finite thede 
sign is frequently parameterized by the number of machines executing in 
parallel. This metric is analogous to scope; as the number of machines in- 
creases, the state space increases exponentially, and it is rarely possible to 
analyze a system involving more than a handful of machines. Fortunately, 
however, it seems that only small configurations are required to find errors. 
T he celebrated analysis of the Futureous+ cache protocol [C +95], which 
perhaps marked the turning point in mode checking’s industrial reputa- 
tion, was performed for up to 8 processors and 3 buses. The reported flaws, 
however, could be demonstrated with counterexamples involving at most 3 
processors and 2 buses. 


7 Intuitions fora Boolean Method 


For a finite scope, a relational formula can be translated into an equivalent 
boolean formula. T he idea is simple. W e represent relations as boolean ma- 
trices, each bit corresponding to a possible edge in the relation. A solution 
to the boolean formula is translated into a solution to the rdational formula 
by constructing the relations whose edges are present or absent according to 
the values of the bits. 


The translation can be performed compositionally. In general, a relational 
term is translated into a matrix of boolean formulae. For each relational 
variable, we introduce a matrix of boolean variables; each term is then 
translated by composing the translations of its subterms. 


Suppose, for example, that we have translated a term p into a matrix [p], 
and aterm q into a matrix [q]. The eement in the ith row and the jth col- 
umn of [p], which we write[p];, is a boolean formula that is interpreted as 
being true when p maps the ith element of its domain type to the jth de 
ment if its range type. T he translation of the term pm q isgiven by 


[pO qli = [pli 4 [pli 


since an edge isin the intersection of two relations when it isin both rela- 
tions. For each of the relational operators there is such a rule. 


Elementary relational formulae are translated into boolean formulae The 
rule 
[Ipcoq] = Aq [pli = [ali 


for example, states that a relation p is a subset of arelation q when there is 
an edge in q corresponding to every edge in p. 


Sets and scalars are treated as degenerate relations. A set-valued term is 
translated into a vector of boolean formulas; the rules for set intersection 
and subset are just like the rules shown above, but with one index omitted. 
Scalars are represented as singleton sets; when a scalar variable is translated, 
a side-condition is created that asserts that exactly one of the dements of the 
Set is present. This side condition is conjoined with the formula obtained 
from translation right at the end. Similar side conditions are generated to 
express properties of relations given in their declarations: that a relation isa 
function, or is total, etc. 


N itpick is thus, in essence, a compiler. Various syntax and type checks are 
applied to the specification. Given the user’s selection of a schema to simu- 
late or aclaim to check, arelational formula is extracted. T his formula is 
then compiled into a boolean formula, which is then presented to a boolean 
satisfier. By maintaining the appropriate associations between the relational 
variables and the boolean variables, the tool can then translate boolean as- 
signments back into relational assignments. 


N ote that the major compilation step - obtaining the boolean formula - is 
scope dependent, since the dimensions of the boolean matrices are deter- 
mined by the number of elements in the basic types. Each time the user 
changes the scope, the formula must be recompiled into anew boolean 
formula, which is then presented to the solver. 


From a design point of view, it seems desirable to implement boolean for- 
mulae with an abstract data type The translator would then call operations 
on the type to construct compound formulae rather than meddling directly 
with their syntax. The solver itself might be encapsulated within the abstract 
type, as an operation of type 


BooleanFormula > BooleanAssignment. 


To change the representation of formulae, or to choose a different solver, 
we would then need only to replace this abstract type. U sing subclassing, 
and perhaps the Factory design pattern [G +95], we could even support 
these selections at runtime. 


W ere thisscheme viable, our paper would be very short. U nfortunately, 
however, it turns out that the choice of solver is not a concern that can be 
easily separated from the rest of the design. Few solvers will accept a boo- 
lean formula with an arbitrary structure the ones that appear to be most 
efficient for our problem require the formula to bein conjunctive normal 
form. A naive translation produces such huge formulae that the translator 
seizes up on even tiny examples. 
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T here are three basic tricks N itpick uses to reduce the size of the formula. 
The first isto decompose it into a digunction of formulae, each itself in 
conjunctive normal form. Rather than attempting to construct a single huge 
formula, these formulae are solved independently; a solution to any oneisa 
solution to the formula as a whole. T he second is to introduce temporary 
variables; this is a tradeoff, since although it can reduce the formula size 
substantially, it can also make the formula harder to solve. The third is to 
find opportunities for decomposition of a formula into a disjunction in 
which several of the diguncts can be discarded. 


Each of these tricks exploits the structure of the relational formula. Thede 
composition into a digunction follows natural cases in the specification it- 
self. The introduction of tenporaries is applied only when translating rela- 
tional terms that tend to cause a blowup. Finally, the discarding of diguncts 
relies on observing certain symmetries in the structure of the relational for- 
mula. 


Loss of modularity is perhaps inevitable After all, it would be foolish not to 
exploit the structure of the problem at hand to the greatest extent. A naive 
translation yields a boolean formula in which this structure has been lost. It 
might, of course, be possible to recover it, but it makes more sense to ex- 
ploit the structure where it is manifest: in the translation process itself. 


8 Structure of the compiler 


N itpick is structured like a compiler (Figure 3). In the first phase, the speci- 
fication is parsed and superficial syntactic checks are performed. Schema 
references are expanded, and for each schema or claim a relational formula 
is obtained, represented as an abstract syntax tree with symbol tables hold- 
ing, for example, type information about variables. T hese formulas are from 
now on treated independently; a series of translations is applied to each. 


In the second phase, each relational formula is converted to disjunctive 
normal form. Thisis a purely logical transformation that treats the elenen- 
tary relational formulae as uninterpreted. T he result of this phase is a set of 
sets of elementary relational formulae. From here on, each set is treated as 
an independent problem, from which a single boolean formula will be gen- 
erated. T hese problems are translated and then solved in turn. 


In the third phase, the relational subformulae themselves are compiled into 
a more primitive relational calculus with fewer operators. In the fourth 
phase, temporary variables are introduced and opportunities for symmetry 
breaking are identified. Finally, in the fifth phase, the boolean formula is 
generated, in conjunctive normal form. T his formula is then presented to 
the solver. Either the solver fails to find a solution (usually but not necessar- 
ily because the formula is unsatisfiable), or it returns an assignment to the 
boolean variables that makes the boolean formula true From this assign- 
ment, an assignment to the relational variables is easily constructed (using a 
mapping generated as a byproduct of the translation), and is displayed for 
the user. 
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T ype checking is currently performed after the third phase on the interme 
diate language. T his has the advantage of greatly simplifying the type 
checker, but it does result in more cryptic error messages than would be 
obtained by analyzing the source code directly. In a production version of 
the tool, it would be better not to postpone type checking. 


Subsequent sections elaborate on these phases in turn. 


9 Relational Disjunctive Normal Form 


The motivation for translating into D N F is two-fold: to decompose the 
checking problem, so that checking can be performed more efficiently, and 
to provide more useful feedback to the user. 


A simplified version of the input language is given in Figure 4. W eignore 
the schema structuring, which affects only the design of the front end, and 
assume that a claim to be checked or schema to be simulated has been ex- 
panded into a relational formula. T his formula language is essentially a first- 
order subset of Z. In the input language, the operators are written in ascii 
form; they appear in ascii here to make the distinction between the input 
language and the intermediate language (discussed later) clearer. 


Disjunctive normal form (DNF), which we shall refer to as rdational D NF 
(to avoid any confusion with the normal form we discuss later on boolean 
formulae) is a subset of the formula language, in which negation can only be 
applied to eementary formulae, and only formulae not containing disjunc- 
tions may be conjoined. In other words, the formula is represented as a 
disjunction of conjunctions of literals, where each literal is an eenentary 
relational formula or its negation. 


Putting an arbitrary formula in normal form can cause an exponential 
blowup. In practice, though, our specifications rardy generate formulae 
with more than a handful of disjuncts. D isjunction arises primarily from 
two idioms. 


In operation specifications, disjunction arises from case splitting. Suppose 
for example, our Join operation (Figure 1) is to handle the case, not previ- 
ously specified, in which the number being called has already been called. 
T he operation might then be written: 


Join2 (p : Ph; n: Num) = [ 
Switch 
| 
p in dom called 
((n not in ran called => called’ = called U {p -> n}) 
and (nin ran called => called’ = called)) 


] 


which says that if the number has already been called, the operation has no 
effect; otherwise the behaviour is as before. W hen asked to simulate this 
operation, N itpick will convert the specification to DNF, obtaining two sets 
of dementary relational formulae, the first corresponding to one case: 
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formula ::= 

formula and formula 
| formula or formula 

| not formula 

| formula => formula 
| elemformula 


elemformula ::= 


expr <= expr relation or set subset 
expr < expr relation or set proper subset 
| expr in expr set membership 
| expr = expr relation, set or scalar equality 
expr = 
expr U expr union of sets or relations 
expr & expr intersection of sets or relations 
expr \ expr difference of sets or relations 
expr (+) expr relational override 
expr ; expr relational composition 
expr ~ relational transpose 
expr . expr relational image or function application 
expr + transitive closure 
expr * reflexive and transitive closure 
dom expr domain of a relation 
ran expr range of a relation 
expr <: expr domain restriction 
expr :> expr range restriction 
expr <; expr domain subtraction 
expr ;> expr range subtraction 
Un universal relation constant 
Id identity relation constant 
{} empty set or relation constant 
{ expr, ..., expr } set constructor 
{ expr -> expr, ...} relation constructor 


Figure4: A syntax of the formula sublanguage of N P 


conns = called ; net 

p in dom called 

not nin ran called 

called’ = called U {p -> n} 


and the second corresponding to the other case: 


conns = called ; net 
p in dom called 
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nin ran called 
called’ = called 


T hese are subsequently treated as separate problems; the tools will attempt 
to find amode for the first, and then for the second. As we shall see later, 

avoiding top-level disjunction has a dramatic effect on performance. 

M oreover, the results obtained by this decomposition are exactly what the 
user would expect, since the two subproblems correspond to separate cases 
in the specification itself. 


The second idiom from which digunction arises occurs within aclaim. The 
specifier might assert that an operation satisifes two properties. For example, 
we might want to say not only that Join preserves the invariant that conns is 
injective, but that it also preserves the invariant that no phoneis both 
making and receiving a call at once This latter invariant can be defined in 
its own schema 


OneRole = [Switch | dom conns & ran conns = {}] 
and we can then formulate the elaborated claim as 


JoinOK1 (p : Ph; n: Num) :: [ 
Switch | Join (p, n) and inj conns and OneRole => inj conns’ and OneRole’] 


This claim is, incidentally, weaker than the combination of the old claim 
(JoinO K) and a separate claim for maintenance of the new invariant, such as 


JoinOK2 (p : Ph; n: Num) :: [ 
Switch | Join (p, n) and OneRole => OneRole’] 


since, by asserting that the conjunction of the invariants is maintained, its 
hypothesis includes both invariants; one invariant may thus contribute to 
the preservation of the other. 


The result of DNF conversion of J oinO K1 is two cases, one for violation of 
the first invariant: 


conns = called ; net 

p in dom called 

not nin ran called 

called’ = called U {p -> n} 
inj conns 

dom conns & ran conns = {} 
not inj conns’ 


and another for violation of the second: 


conns = called ; net 

p in dom called 

not nin ran called 

called’ = called U {p -> n} 

inj conns 

dom conns & ran conns = {} 
not dom conns & ran conns = {} 


15 


Again, checking these separately is a big performance win, and gives useful 
information to the user: namely which invariant is broken. 


Elaborating both the claim and the operation, so that the claim is now 


JoinOK3 (p : Ph; n: Num) :: [ 
Switch | Join2 (p, n) and inj conns and OneRole => inj conns’ and OneRole’] 


would yield four analysis problems. W hen the checker finds a counterex- 
ample, it will indicate which branch of the operation violates the property 
and which property is violated. 


Technically, the consequent of the implication of an apparently simple 
claim such as] oinO K2 is actually a conjunction, since the schema reference 
OneRole is expanded into the conjunction 


conns = called ; net 
dom conns & ran conns = {} 


A naive translation would yield a case such as 


conns = called ; net 

p in dom called 

not nin ran called 

called’ = called U {p -> n} 
dom conns & ran conns = {} 
not conns = called ; net 


which is obviously not satisfiable. To avoid generating such spurious cases, 
the tool performs some basic simplifications during conversion to DNF, so 
that no conjunct contains both an edementary formula and its negation. 


10 Intermediate Relational Language 


The second major transformation is applied at the level of the relational 
operators, within the elementary subformulae. | ts motivation is to simplify 
the code of the subsequent transformations. T he result of this step, which 
can be viewed as an intermediate language, is a formula with the same 
structure, but with relational expressions expanded to compensate for a 
more frugal repertoire of relational operators. 


T he operators of this intermediate language are listed in Figure 5. They in- 
clude the three set-theoretic operators (union, intersection and difference) 
that can be applied to sets and relations; the two quintessentially relational 
operators (composition and transpose); and three constants (the universal, 
identity, and empty relations). T hese, along with the equality operator for 
obtaining formulae from terms, together constitute a basic relational lan- 
guage equivalent to T arski’s relational calculus [Giv88, T ar41]. T his lan- 
guage is as expressive as first-order predicate logic (so long as we permit 
definition of projection functions so that tuples can be constructed), and is 
thus undecidable [Sch79]. 


T he language includes, additionally, the transitive closure operator, which 
extends its expressiveness, and is indispensable in practice. We also add two 
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formula ::= 


expr = expr equality 

| expr c expr subset 

expr = 
Id identity relation constant 
Un universal relation constant 
0) empty relation constant 
expr U expr union 
expr om expr intersection 
expr \ expr difference 
expr ; expr relational composition 
expr ~ transpose 
expr + transitive closure 
expr <: expr domain restriction 
dom expr domain 


Figure 5: Intamediate Language 


operators that add nothing in a formal sense, but which allow more efficient 
generation of boolean formulae. T hey are the domain operator, which takes 
a relation to the set of dements that it maps, and domain restriction, which 
given a set and arelation, produces the subrelation whose pairs have first 
elements in the set. W e view these as the ‘bridging operators’ that connect 
relations and sets. 


T he language's typing rules and semantics are given in Figures 6 and 7. M 
and E are the meaning functions for formulae and expressions respectively; 
C mapsa type to its carrier set. Each operator is given a meaning in terms of 
naive set theory. A relation is viewed as a set of pairs; the union of two rela- 
tions thus becomes, for example, the union of the two sets of pairs. A set is 
not viewed directly as a set, but rather as a relation whose range type is the 
special type U nit, consisting of exactly one atom unit. To encode a set with 
this representation, we simply construct a relation that includes the pair (¢ 
unit) for each element e of the original set. 


The constants (universal, enpty and identity) are to be considered as in- 
dexed sets of constants, each with a different type So in an expression such 
as (Un - p), the appropriate instance of the constant will be chosen, whose 
type matches the type of the relation p. 


W e chose to represent sets as relations because it corresponds to the natural 
boolean representation of a set as a bit vector. T he result is a cleaner back- 
end, in which set-theoretic operators can be treated identically for the set 
and relation cases. 


17 


Id: TOT 


Un: SoT 


O:S0OT 


pPSOT,q: SOT 


pUq:SoT 


pPSOT,q: SOT 


pAq:SeT 


pPSOT,qGq: SOT 


p\q:SoT 


pSoT,q: Tov 


p;q:SoVv 


piSsSoT 


p~:Tos 


pilowt 
p+: Tort 


p: So Unit, q: SOT 


p<:q:SoT 


piSoT 


dom p: S$ Unit 


Figure6: T yping rules for the intermediate language 


A more elegant treatment of sets [SS93], which we have used to justify re 
ductions in our explicit checker [}]D 97], represents the set sas the relation 
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corresponding to the cross product of sand the universal set. In this scheme, 
no additional operators are required: the domain of a relation, for example, 
is obtained by composing the relation with the universal relation. In prac- 
tice, however, it has a major disadvantage. T he source language expression 


s<ir 


where sis aset and risa relation, would be translated into the intermediate 
language expression 


Sar 


where § is the relation corresponding to (sx Un). What is the type of S in 
this expression? Its domain type will be the type of the dements in the 
original set s. Its range type, however, must be the type of the range of r. 
Since the type of sin the source expression places no constraint on the range 
type of r, we must equally allow 


Saq 


where q has a different range type from r. The relation S must therefore be 
regarded as polymorphic in its range type. T his complicates the backend of 
the compiler far more than the addition of a few extra operators to the in- 
termediate language. 


T ype declarations in N itpick, asin Z, may involve implicit constraints. A 
relation may be declared to be a function, or to be total, for example. These 
constraints are translated into the operators of the intermediate language. 
For example, the assertion that the relation p is a function can be expressed 
as 


p~; pcld 


Each such elementary formula extracted from a declaration is conjoined to 
every clause in the DN F representation, since type constraints apply in 
every case. 


For example, the clause 


conns = called ; net 
conns’ = called’ ; net 

p in dom called 

n notin ran called 

called’ = called U {p -> n} 
inj conns 

not inj conns’ 


with type declarations 
p: Ph; n: Num; called: Ph <-> Num; net: Num -> Ph; conns, conns’: Ph <-> Ph 


is translated into 


conns = called ; net 
conns’ = called’ ; net 
p € dom called 

ane dom called~ 
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called’ = called U(p <: (n <: Un)~) 
conns ; conns~ cld 
3 conns’ ; conns’~ cld 


with type declarations 
p: Ph <-> Unit; ; n: Num <-> Unit; 
called: Ph <-> Num; net: Num <-> Ph; conns, conns’: Ph <-> Ph 


and additional type constraints 


net~ ; net cld 


11 An Interlude: Boolean Conjunctive Normal Form 


To motivate the last two stages of compilation, we must take a short digres- 
sion to explain the normal form in which the boolean formula is finally cast, 
since its structure and properties motivate the design of these stages. 


The boolean formula is represented in conjunctive normal form, that is, asa 
conjunction of disjunctions. A formula is a set of clauses; a clauseis a set of 
literals; a literal is a boolean variable or its negation. H ere are some examples 
of formulae and their representation in CNF: 


anb {fa}, {b}} 

avb {{a, b}} 

a>b {{-a, b}} 

aob {{-a, b}, {a, -b}} 


In acompositional translation to conjunctive normal form, we will need to 
know, given two formulaeF and G in CNF, how to create the CN F for- 
mula for F AG, F v G, —F, etc. Conjunction is easy; the clauses of F ~ G 
are just the clauses of F and the clauses of G. Digunction, on the other 
hand, is harder. From the identity 


(anb)v(c ad)=(avOa(avaa(bvaa(bvd) 


we see that the clauses of F v G are obtained by forming the cross-product 
of their individual clause sets. So although the sizeof F ~G isat most the 
sum of the sizes of F and G, the sizeof F v G may beas large as the product 
of their sizes. 


N egation is even worse; in the worst case it produces an exponential 
blowup. Applying de M organ’s laws to our sample expression, we get 


a (av b) a (cv d)) 
=a(avb)via(cvd) 
= ( 1A A ib) v ¢ iC A id) 


In general, the CNF of composite formulae is obtained according to these 
rules: 
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MI[p cq] = V a,b. (a,b) € E[p] = (a,b) € Ela] 

M [p = q] = V a,b. (a,b) € Elp] = (a,b) € Ela] 
E[lId: T oT] ={ (a,a) | ae C[T]}} 

E [0] ={} 

E[Un: ST] ={ (ab) | ae C[S] av be CT} 
E[p Uq] ={ (a,b) | (a,b) € Elp] v (a,b) € Elq]} 
E[p mq] ={ (a,b) | (a,b) € Elp] « (a,b) € Elq]} 
E[p \ q] ={ (a,b) | (a,b)  Elp] ~ (a,b) ¢ Elq]} 
E[p ; q] ={ (a,b) | Ac. (a,c) € Elp] ~ (c,b) € Elal} 
E [p~] = { (a,b) | (b,a) € Ep} 

E [p+] = the smallest x such thatx ;xc XA pcxXx 
E[s <: p] ={ (a,b) € E[p] | (a,unit) € E[s]} 


E [dom p] = { (a,unit) | db. (a,b) € E[p]} 


Figure 7: Senantics of intermediate language 


[FAG] = [F] U[G] 
[FyG] ={fug|fe [Flage [G}} 


[HF] = {c c Vars(F) | V f e [F]. a! x.xe fa axe f} 


12 Translation to Boolean Conjunctive Normal Form 


U nderstanding the penultimate step depends on understanding the final 
step, so we explain the latter first. T he penultimate step involves no change 
of language its input is a relational formulain DNF, and its output is an- 
other relational formulain DNF. In the final step, each DNF clauseis 
translated into a boolean formulain CNF. 


A relation value can be represented as a boolean matrix, with truein the ith 
row and thejth column exactly when the ith eement of the domain type is 
related to the jth element of the range type. So a variable denoting a relation 
can be represented as a matrix of boolean variables, and, in general, a rela- 
tional expression can be represented as a matrix of boolean formulae. It is 
easy to define a compositional translation that obtains the representation 
X[e] of a relational expression e from the representations of its parts. Rela- 
tional composition, for example, corresponds to matrix product: 


X[p ; qli = ve X[plix a X[q]is) 
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The full set of translation rules appears in Figure 8. No rule for transitive 
closure appears; as we shall see, the penultimate step (described in the next 
section) eliminates it. 


For each elementary subformula, a single boolean formula is derived. W rit- 
ing B[F] for the boolean formula derived from rdational formula F, we 
have, for example, 


Blpcq] =A A X[pli = X[q]i) 


which, in terms of graphs, simply says that a rdation p is a subset of a rda- 
tion q when the presence of any edge in p implies its presence in q. Note 
how equalities and inequalities translate smoothly into CN F: each element 
X[p], contributes a clause. The number of clauses thus rises linearly with the 
number of elements (and thus quadratically with the scope), although the 
clauses themselves grow more rapidly. T his observation is a major motiva- 
tion for the choice of CNF. 


Since the relational formula has been converted to DNF, with each con- 
junct being regarded as a separate problem to solve, there are only two logi- 
cal operators on relational formulae: negation and conjunction. Each is 
translated directly: 


BI-=F] = —B[F] 
B[F A G] = B[F] 4 B[G] 


Conjunction is well behaved: each new elementary relational formula adds a 
new set of clauses, so that the number of clauses in the final formula is lin- 
ear in the size of the specification. N egation, as we have seen, however, isa 
major problem, and we shall explain below how its effects are minimized. 
Fortunately, the DNF clauses derived from practical problems tend to in- 
volve few negated formulae (rarely more than one). 


W e explained above (Section 10) how typing constraints are translated into 
elementary relational formulae, and subsequently have no special treatment. 
In our previous work [DJ] 96], in which we used binary decision diagrams 
to solve the satisfaction problem, we took a different approach, in which the 
encoding of a variable exploits its type constraints. 


Consider representing a total function to a set of 4 elements. Its value, seen 
as a bit matrix, must have exactly one bit true in each row. If, in the en- 
coding of the variable, the row is represented as 4 separate boolean variables 


< bo, bi, bz, b3 > 
this constraint must be expressed as a side condition 


bo v bi v b2 v bs 

A bo = —=bi A ab2A abs 
A bi = boa ab2A abs 
A b2 = —=bo a abi abs 


A b3 => —bo a mba —b2 


22 


Blpcq] =A A &lpli = XI ai 
Blip = q] =a ai Xlpli = Xl 
X [Idli = G = j) 

X [O]i, = false 

X [Un] = true 

X[p vu gli = Xlpli v Xai 

X [po gli = Xlpli a Xla]i 

X[p \ qli = Xlpli 4 aXIq]i 
X[p 5 dla = ve X[plix A Xq]k 

X [p~]i = XI pli 

X[s <: pli = X[p]ii a XIslio 


X [dom plio = ve X[plix 


Figure8: T randation from rdational to boolean form 


Since each row in the matrix represents one of 4 possible values, only two 
boolean variables should be necessary. Viewing the pair of variables b,b, asa 
binary number, we can instead represent the row like this 


< abo A bi, abo A bi, bo A Abi, bo A bi > 


in which theith entry is the assertion that the binary number b,b, denotes 
the integer i. This encoding reduces the number of boolean variables re 
quired from n to log n, and dispenses with side conditions. It applies 
widely, since not only functions but also scalars (which are treated as sin- 
gleton sets) can be encoded in this manner. 


In our BDD method, which was very sensitive to the number of variables, 
this gave a dramatic improvement in performance. But for the current 
method, these more sophisticated encodings perform considerably worse 
than simple encodings. In CN F, each entry in a row such as that shown 
above will consist of two clauses rather than one. T his small difference is 
amplified by disjunction. 


Consider, for example, translating the composition of two rdational vari- 

ables in a scope of k (that is, in which all domain and range types contain k 
elements). Each entry in the resulting matrix is the result of a digunction of 
k terms. In the simple encoding, each term consists of two singleton clauses, 
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so the entry can have 2* clauses. In the more sophisticated encoding, each 


term consists of (2 log k) singleton clauses, so the entry isa factor of (log k)° 
larger. 


T hese calculations are borne out by our experiments. W e implemented both 
encodings. T he clever encoding not only brought no improvement, but in 
many cases caused the formula to outgrow available memory so rapidly that 
the formula could not be generated at all. 


At alower level, acrucial consideration ishow the CN F formula is repre 
sented and whether simplifications are applied during translation. If a for- 
mula contains a clause and one of its subsets, the clause can be discarded; 
this is known as subsumption and follows from the fact that any mode! of F 
is also amodedi of F v G. Eliminating redundant clauses early on can have 
an enormous effect on the size of intermediate formulae. 


O ur prototype uses a trie- based representation developed by Zhang and 
Stickel [ZS94]. Using tries naturally eliminates some redundancy - since 
literals in clauses are lexically ordered, and prefixes are shared, it is easy to 
ensure that a clause never appears with one of its prefixes. The trie also sup- 
ports a simple and efficient implementation of the D avis-Putnam satisfi- 
ability algorithm [D P60]. 


A huge performance gain is obtained by negation caching. A direct imple 
mentation of the translation rules will sometimes cause the negation of a 
formula to be negated. Consider, for example, translating 


t=p;q 
where p and q are relational variables. Each element of the matrix repre 
senting the composition is a disjunction 


vi E[p]ix A Ela] 


which, in CNF, will require 2° clauses of 2 literals each. Representing an 
equality formula causes negation of boolean formulae on both sides 


xX =F 
=(x > F) A (F => x) 
= (4X v F)A GF v x) 


So the negation of these elements will be required. Directly negating an 
elenent formula causes a second blowup, so a naive translation will result in 
a huge formula. But the negation of the dement formula is its dual 


Ak E[p]ik v Ela] 


and this formula has a small CN F representation: k clauses of 2 literals. 
When the dement formula is computed, the compiler therefore computes 
its negation at the same time, and caches it; when the negation is later re 
quired, the cached formula is used, and no computation is performed. Ad- 
ditionally, whenever a formula’s negation is computed, the original formula 
is cached as its negation, so that no formula is negated twice. 
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13 Introduction of Temporary Variables 


W enow turn to the phase that precedes the final one. It involves a simple 
manipulation of the intermediate language formula, designed to minimize 
the insidious effects of disjunction when the boolean formula is subse. 
quently generated. 


This manipulation is nothing more than the introduction of fresh variables 
to replace relational subexpressions. Recall that the relational formula is in 
disjunctive normal form, each clause of which is treated separately. Suppose 
we havea formula that contains the subexpression e. Without changing the 
meaning of the clause, we can replace e by a fresh variable v, and add to the 
clause the equality v =e By maintaining a set of replaced expressions, we 
avoid introducing unnecessary new variables; if eis to be replaced in another 
context, the same variable v is used. T his brings the standard benefit of 
common subexpression elimination: each subexpression is only translated 
once. But its primary motivation is that introducing new variables can have 
a dramatic effect on the size of the final boolean formula. 


D ecisions about where to introduce variables are currently made according 

to some simple heuristics that seem to work well in practice. Although they 
seem plausible from a theoretical point of view, we have not shown them to 
be optimal, and it is likely they could be improved considerably. 


A disjunction of k boolean terms each with m clauses can result in a formula 
of sizem‘. Variables are introduced when either m or k is large. We shall 
consider the cases in order of increasing importance. 


e Theleast dramatic case, but one that nevertheless merits variable intro- 
duction, arises when each element of the resulting matrix is obtained by 
combining, with disunction, an eenent from each of two matrices. In 
translating the union expression p U q, we must therefore ensure that 
elements of the matrices resulting from translating p and q do not con- 
tain many clauses. N ote that if p and q are relational variables, the de 
ments of X[p U q] will still only contain a single clause. Composition 
and intersection, on the other hand, always create elements with multi- 
ple clauses. So our heuristic is to replace any expression that appears in 
aunion expression that is itself a composition or an intersection expres- 
sion. 


e A similar issue arises at the elementary formula level. Translating p c q 
also involves an element-wise disunction, so we replace either p or q 
with a variable when both are expressions involving composition or in- 
tersection. 


e Translating relational composition gives rise to a disjunction of k terms, 
where k is the length of arow or column (ie, the scope). Each of these 
terms is itself the result of a conjunction, so composition is always ex- 
pensive. All subexpressions therefore, unless trivial (a variable or the 
transpose of a variable), are replaced by variables when appearing in a 
composition expression. 
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The most dramatic effect of disjunction arises for negated formulae. C on- 
sider, for example, the dementary formula 


pz#0 


where p is some relational expression. T his translates to a digunction of kK’ 
terms for a scope of k. In practice, unless p isa variable, the translation is 
usually infeasible. So for any negated formula, we replace the expressions on 
both sides with variables. In this case, the result is a formula with just one 
clause, but for other negated formulae the result is usually still large The 
translation of 


apcq 


for example, has 2 to the power k’ clauses when both p and q are variables. 
For any scope above 3, this is infeasible, so this case is given special treat- 
ment (see Section 15). 


14 Transitive Closure 


Translating transitive closure presents some special problems. T he transitive 
closure of a relation p, p+, is the smallest relation r that includes p 


per 
and is transitive 
rorer 
and can be computed by the series 


pUu(p; PUM ip; py... 


Since p must be homogeneous, we can view it as a graph with one node for 
each element of the domain type of p, and an edge from node a to node b 
whenever p relates a to b. Two nodes are associated by (p ; p) if thereisa 
path of two edges from one to the othe; by (p ; p; p) if thereisa path of 
three edges, and so on; and by the closure if there is a path of any length 
between them. 


Computing the closure of arelation can thus be viewed as constructing 
paths in such agraph; in each step we compute a set of longer paths, and 
stop when we reach the fixpoint in which no path can be lengthened. H ow 
many steps might this take? If there are n nodes in the graph, a pair of nodes 
can either be connected with a path of length n or less, or not at all. It fol- 
lows that a smple iterative computation will require at most n steps, each 
involving a union and on average n/2 compositions. 


A more efficient way to translate closure is by the series: 
Po=p 
pis1 = (pi; pi) U pi 


Since any path of length no greater than 2k can be decomposed into two 
paths each of length no greater than k, it follows by induction that the ith 
approximation in this series will associate nodes that are connected by a 
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path of at most 73 edges. In this case, however, we will reach convergence in 
log n steps, This techniqueis known in model checking [BC +92] as iterative 
Squaring. 


O ne way to encode closure is to apply the translation rules for composition 
and union on the fly, translating the approximations p,. For exactly the rea- 
sons explained above (in Section 13), this will not generally be feasible: the 
unions lead to disjunctions of increasingly large formulae. So rather than 
using the series to evaluate the closure in boolean form, we use it to expand 
the closure expression syntactically. By performing this unwinding just prior 
to the stage in which variable introduction occurs, the effects of the dis- 
junctions are mitigated as for any other complex expression. 


Although this works well and is easy to implement, it does have one unde 
sirable consequence. The compilation becomes scope dependent in an ear- 
lier phase. W ithout closure, only the final translation to a boolean formula 
depends on the scope; now variable introduction becomes dependent too. 
In practice this is a minor annoyance, since the bulk of the compilation 
time isin the final phase 


15 Symmetry Breaking 


For relational formulae involving only the set operations (union, intersec- 
tion and difference), the boolean formula grows only with k’ for a scope of 
k. Composition, unfortunately, introduces a factor of 2‘. But far worse is 
the effect of certain negations: the formula 


pq 
is equivalent to 
vi 7 Elplii © Elli 


which results in a disjunction of k* terms, resulting in a boolean formula of 
2 to the k’clauses. Generating such a formula is infeasible for k > 4. 


Such formulae arise primarily in two places. O perations often have precon- 
ditions that assert that a scalar is not in a set; this becomes an inequality on 
relations, but of dimension 1 x k, so the blowup is no worse than for a 
composition. M ore seriously, it is common to assert that an operation leaves 
a state component unchanged. T his gives a formula like 


op ap’#p 
where op is the specification of the operation. A similar situation arises for a 


claim that an operation preserves an invariant that one relation is a subset of 
another. 


To address this problem, we apply some ideas from our previous work on 
symmetry [}DJ 96, })D 97]. As we have shown, the mode's of relational for- 
mulae are permutation invariant: given amodel of a formula, the assignment 
that results from permuting the atoms of the underlying universe will also 
be amode. Consider now a model of the formula 
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FAp#q 
in which for some particular values of i and j, 


+ Elphi = Elali 
Applying the permutation that mapsi and j both to 0, we obtain amode in 
which 

— E[ploo = Elq]oo 
is true Wecan therefore replace the disjunction 

vi 7 Elpli  Elali 
by the single tem 

— E[p]oo = Elq]oo 
which, if p and q are variables, reduces the boolean formula to a single 
clause! 


This argument has some subtleties. If p isa homogeneous relation, its indi- 
ces cannot be permuted independently. Consequently, we have to includea 
diagonal and an off-diagonal element. In general, as we plan to explain ina 
forthcoming paper, more than one negated equality (or inequality) can be 
reduced in this way, so long as each reduction exploits permutation of a 
different type. The prototype tool ranks the negated subformulae according 
to their severity, and then allocates symmetry breaking reductions from the 
most to the least severe, making sure never to break symmetry on the same 
type twice. Fortunately, there is usually only one negated subformula on full 
relations, so the method works remarkably well. 


N egative equalities can also be translated into digunctions that are handled 
separately, in the same manner as top-level disjunctions. The formula 
pq 
is equivalent to 
Gpcadvoqep) 
T he negated inequality 
ips 
can then be rewritten as 
pan \ g4} 
which can be simplified with variable introduction to 
t=po(Un\ q) 
tz 


thus eliminating subsequent translation steps that would involve any more 
than elementwise operations on the boolean matrices. Problematic elenen- 
tary formulae that cannot be handled by symmetry breaking (because all 
types have been ‘consumed’) might be handled in this way. 
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16 Solving the Boolean Formula 


We have experimented with two solvers. We wrote a D avis-Putnam solver 
[D P60] using Zhang and Stickel’s trie based representation of formulae 
[ZS94] in Java as part of the prototype. W e also wrote code to generate in- 
put files for Selman and K autz’s WalkSAT solver [SKC 94], a descendant of 
GSAT [SLM 92]. 


The two solvers are very different. W alkSAT, unlike D avis Putnam, isin- 
complete: it may fail to find a satisfying assignment even though one exists. 
Also, because it starts from a random assignment, WalkSAT cannot be 
made to produce the simplest solutions first; in our D avis-Putnam imple 
mentation, by favoring false over true in the case splitting step, we are able 
to bias the solver towards small relations. On the other hand, WalkSAT can 
handle much larger problems. W e found that D avis-Putnam often stops 
working when thenumber of boolean variables exceeds a few hundred. 
WalkSAT, on the other hand, isso fast that the time it takes to solve the 
formula is invariably dominated by the time the compiler took to construct 
it. 

W e have not found the incompleteness of W alkSAT to bea problem: we 
have yet to come across a problem that we believed had a solution but 
which WalkSAT failed to find. 


Both methods have a desirable performance property. W hen there is a solu- 
tion, it tends to be found very quickly; when there isn’t, the solver can run 
for a very long time The engine underlying our previous implementation of 
N itpick [J)D 97] did not have this property: in many cases, successful and 
unsuccessful searches took roughly the same amount of time (within a fac- 
tor of 2 to 10). Of course we would like the solver to work fast whatever the 
outcome, but if forced to pick, a very uneven distribution biased towards 
yielding solutionsis, in practice, much better. 


17 Performance results 


Some performance results for a variety of specfications are shown in T able 

1; the specifications are reproduced, mainly in full, in the Appendix. The 
first two examples are toy benchmarks; Phone is the specification of Figure 1 
(but for a different claim, invB_ preserved) and Finder isa specification of the 
directory structure of the M acintosh Finder. 


Style is a specification of the paragraph style hierarchy of M icrosoft W ord, 
described in detail in [JD 96a]. 


Allocate is a simplified fragment of a railway interlocking specification writ- 
ten in Z by PraxisUK PLC. The performance figures are for the claim Al- 
locSafe2. 


M obile|P is a specification of a draft version of a mobile internet protocol 
for |Pv6, written by an undergraduate at C arnegie M ellon [JN W 97, Ng97]. 
T he performance figures are for the claim loc_update OK, which exposed a 
flaw in the protocol. 
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T able 1 is to be interpreted as follows. The column marked Cases gives the 
number of clausesin the D NF representation of the claim; F ormulae gives 
the maximum number of relational formulae in each clause. |n the analysis 
of Style, for which there were 12 clauses, a mode! was always found in the 
first clause; the other columns refer to this clause alone 


The columns marked Vars and Clauses give the number of boolean variables 
and clauses in the emitted formula; T randate and Solve give the times to 
perform the translation and solve the boolean formula. All formulae were 
solved using W alkSAT , except when DP appears in parentheses after the 
solving time for these, the built in D avis-Putnam solver was sufficient (ie, 
found a solution in less than 30s). All timings were obtained using Sun- 
Soft’s just-in-time compiler, running under Windows NT on aPentium 
133M Hz processor with 64M B of memory. All times are wall clock times; 
no attempt was made to get precise numbers for times less than 1 second. 


The final column gives the time taken by the explicit version of N itpick, 
coded in C and running on aM acintosh with a 66M Hz PowerPC 601 
processor and 24M B of RAM. Since this machine is perhaps 3 times slower 
than the Windows machine, the solving times for the new method should 
be multiplied by 3. (T he translation times do not need the same adjust- 
ment, because of a compensating discrepancy between J ava and C). 


The entry ?? indicates that a model was not found in a reasonable time, we 
set a bound of one hour. The explicit checker did find a model for M obile 
IP, but for a smaller scope in which different bounds were associated with 
different types. T his scope could not be checked with the new method, be 
cause the prototype currently allows only scope settings that give every type 
the same bound. 


Ascan beseen, the new method outperforms the old method in almost all 
cases. It seems likely that it will be able to handle much larger mode's. 

M oreover, the new method is much simpler to implement; our entire pro- 
totype is only about 6000 lines of J ava code, of which almost half is con- 
cerned with front-end functionality (parsing and static checks). 


The explicit method is barely able to handle the M obile|P example, and 
cannot handle Allocate at all. Allocate is particularly well suited to the new 
method, because it involves several relations but few compositions. Finder, 
on the other hand, is biased towards the explicit checker; its relations are 
constrained to be functions, and transitive closure (which isno more expen- 
sive for the explicit checker than any other operator) appears several times in 
the formula. It is nevertheless surprising that W alkSAT takes so long to 
solve the scope 6 instance of Finder - it required about 760 thousand flips 
of boolean variables, distributed over 8 separate attempts from different, 
randomly generated starting assignments. 


T ables 2 and 3 show the effects of negation caching and symmetry break- 
ing. In both, the italicized columns represent measurements taken without 
the optimization. N egation caching has a huge effect, increasing with the 
scope; in many cases, it reduces the final formula size by an order of mag- 
nitude. T he scope 4 instance of Style is worth noting: the translation time 
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but not the formula size is dramatically reduced. T his is probably because 
negation caching saves unnecessary computation, and reduces the size of 
many formulae generated in the course of the translation. 


Symmetry has a negligible effect on all examples except for Style the only 
example whose formula involves negated equalities of relation-valued vari- 
ables. The italicized numbers in this table were obtained with symmetry 
breaking off, but negation caching on.W ithout symmetry, the formula for a 
scope of 5 cannot even be generated. 
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Example | Cases Formulae Scope Vars Clauses | Translate Solve Explicit 

Phone | ] 10 3 66 307 Os Os (DP) Os 

4 112 842 Os Os (DP) 0.5s 

5 170 2159 Os Os (DP) 35s 

6 240 5413 1s 2s (DP) 5m 
Finder | ] 47 3 273 1364 Os 1s (DP) 0.5s 

4 464 3500 Os Os 1s 

5 705 8483 Os 35 11s 

6 996 20779 9s 2m,33s 2m,12s 
Style | 12 71 3 408 1864 1s 6s (DP) 1s 

4 704 4385 2s Os 11s 

5 1080 9977 6s 5s 11m,45s 
Allocate | ] 41 3 192 522 Os 1s (DP) 10s 

4 316 931 Os Os ? 

5 470 1473 Os Os 2? 

10 1690 6693 6s Os 2? 
Mobile IP | ] 68 3 438 2436 Os Os 2? 

4 760 6548 35 Os ? 

5 1170 16536 8s Os 2? 


Table 1: Performance compared to explicit method 
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Example Scope Vars Clauses | Clauses Trans Trans Solve Solve 
Phone 3 66 60] 313 Is Os Os (DP) Os (DP) 
4 112 2808 856 Os Os Os (DP) | Os (DP) 
5 170 12489 2189 31s Os 1s (DP) | Os (DP) 
6 240 52923 5413 6m,12s | 1s 55 (DP) | 2s (DP) 
Finder 3 273 1793 1370 Os Os Os 1s (DP) 
4 464 6935 3500+ 2s Os Os Os 
5 705 27831 8483+ 275 Os 5s 35 
6 996 112635 | 20866 9m,325 | 9S ? 2m,33s 
Style 3 408 2908 2234 Os Os 65 (DP) | 6s (DP) 
4 704 57377 53495 Im,51s | 17s* Os Os 
5 1080 ?? 2? ?? 2? ?? 2? 
Mobile IP 3 438 3330 2439 Is Os Os Os 
4 760 13242 6608 178 35 Os Os 
5 1170 52826 16691 2m,565 | 95S Os Os 


Table 2: Effect of negation caching 
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Example Scope Vars Clauses | Clauses Trans Trans Solve Solve 

| Style 3 408 2234 1864 Os Os 25 (DP) 6s (DP) 
4 704 53495 4385 175 2s Os Os 
5 1080 ?? 9977 ?? 6s ?? 5s 


Table 3: Effect of symmetry breaking 


18 Related Work 


Our previous method worked by explicit enumeration of relation values, 
with two principal mechanisms to prune the search: short circuiting, in 
which a partial assignment could be rejected by determining that any exten- 
sion to a full assignment would not yield amodel [DJ 96], and isomorph 
dimination, which exploited symmetries in the search space to avoid the 
generation of ahigh proportion of the relation values [JDJ 96, J)D 97]. 


On most of our examples, the new boolean method performs better. In 
particular, the explicit checker does badly when there are several variables 
representing relations that are not constrained to be functions, and for 
which there are few mutual constraints; in this case, short circuiting fares 
badly, and the search is often intractable The explicit checker always prefers 
functions to relations; the boolean checker, in contrast, prefers relations 
since there are then no side conditions. On the other hand, increasing the 
complexity of the formula itself tends to improve the performance of the 
explicit checker, but it has a detrimental effect on the boolean checker. The 


explicit checker can take advantage of equalities by eliminating variables; the 


boolean checker, in contrast, introduces variables to avoid complex expres- 
sions and reduce translation time and pays the price in solving time For 
this reason, the explicit checker might be better as a simulator: if operations 
are expressed mostly constructively (with post-state variables equated to ex- 
pressions involving pre-state variables), it can explore the execution of ase 
quence of operations with little extra cost. 


In his PhD thesis, Craig Damon is investigating bounded gneration, anew 
pruning mechanism for the explicit checker that seems, from initial experi- 
ments, to be promising. It remains to be seen how it will perform in com- 

parison to the boolean checker. 


We have experimented before with a boolean checker, representing boolean 
formulae not in CNF but with ordered binary decision diagrams (BD D s) 
[DJ]96]. Although that method performed very well on some small exam- 
ples, it did not appear to scale. Unlike CN F, aBDD is canonical, so trans- 
lation cannot be separated from solving; if the formula has no models, it 
must be the formula false Translation into BD Ds can take vast amounts of 
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memory, and offers no discount for finding only some mode's, since the 
final BD D represents all models. The BD D -based checker also suffered 
from the well known unpredictability of BD Ds; asmall changein the vari- 
able ordering might have a dramatic effect, and problems that looked alike 
were often not equally hard to solve. The canonicity of BD D sis essential in 
the context of mode! checking, because it allows detection of fixed points. 
But for our problem, canonicity is unnecessary and its cost is not warranted. 


T he remarkable success of the stochastic solver GSAT [SLM 92] and its de 
scendants has made boolean translation attractive in other domains too. 

T he closest application to oursisin planning, where the problem of finding 
a plan that satisfies a set of constraints is reduced to finding a mode of a 
boolean formula [K S96, EM W 97]. The planning problem is technically 
closer to the model checking problem than to our problem: namely finding 
a sequence of transitions in a state machine that leads to a state satisfying a 
given property (in planning, the goal, and in mode! checking, the negation 
of the invariant). The focus on data structures rather than on transition se 
quences makes our compilation process rather different from those em- 
ployed in planning. 


19 Future Work 


19.1 Algorithms and Data Structures 

The trie representation of CN F is reasonably compact - typically a third 
smaller than a representation without sharing of subclauses - and, because 
of the ordering of variables, efficiently manipulated. T he trie structure 
eliminates some redundancy, but not all: a clause may appear with a sub- 
clause if the subclause is not a prefix. We performed some experiments to 
determine what proportion of clauses held in the trie are redundant; it ap- 
pears to be around 30% for large tries. It might be possible to eliminate 
more redundancy without too much additional computation, perhaps by 
using a different data structure such as a suffix tree. 


19.2 Language Extensions 


Several features would add nothing in expressive power but would make N P 
an easier language to use. 


Quantifies over xalar variables would relieve the burden of encoding all 
constraints with relational operators alone. O ne common idiom that calls 
for quantifiers (or perhaps a dual of the composition operator) arises from 
an assertion that would be written with quantifiers thus: 


forall x, y, z.  -> y) in p and (y -> z)ing => (x -> z)inr 
To encode this in relational operators, we rewrite with negations 


not exist x, y, Z. (x -> y) in p and (y -> z) ing and not  -> z)inr 


and then translate negation using complementation: 


p;qg&Un\n= 
T he purely relational version is often - as here - terse, elegant and obscure 
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An abstraction mechanism for generic functions and constraints would simplify 
specifications by factoring out complex idioms. T ree structure, for example, 
arises frequently; rather than axiomatizing each tree structure anew, the 
specifier might write T ree(p) to assert that a relation p mapping nodes to 
their parents describes a tree, with the predicate T ree previously defined: 


Tree (p) = (Fun(p) and Acyclic(p) and One(Roots(p))) 
Acylic (p) = (p+ & Id = {3) 

Roots (p) = (ran p \ dom p) 

One (s) = (Fun(Un :> s)) 

Fun (p) = (p~ ; p cld) 


Ternary rdations are not common, but are clumsy to encode as binary rela- 
tions and should be supported directly. A ternary relation ron A x B x C is 
probably best treated curried, so that r.a denotes a relation on B x C. 


Integers would add expressive power. Subtraction, addition and comparison 
alone would allow a sequence of elements of type T to be modelled more 
naturally as a function from a prefix of the naturals to T, where currently we 
are forced to represent the sequence more abstractly as an ordering of buck- 
ets that contain values. Indexing of sequences would simply be an instance 
of function application; concatenation would be provided explicitly. H ow 
to handle the semantics of integer addition over a finite scope is not clear. 
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Appendix: Benchmark Specifications 


Phone 
[Ph, Num] 
Switch = [ 


Called: Ph <-> Num 
const Net: Num -> Ph 
Conns: Ph <-> Ph 


Conns = Called ; Net 
] 


Join (p: Ph; n: Num) = [ 
Switch 


p in dom (Called) 

not (n in ran Called) 

Called’ = Called U {p -> n} 
] 


invB = [Switch | (dom Conns) & (ran Conns) = {3] 
invC = [Switch | fun (Conns~)] 


InvB_preserved (p: Ph; n: Num) _ :: [Switch | Join(p,n) and invB => invB’] 
InvC_preserved (p: Ph; n: Num) :: [Switch | Goin(p,n) and invC) =>invC’] 


Finder 
[OBJ] 


Finder = [ 
const drive, trash: OBJ 
const files, folders: set OBJ 
dir, links: OBJ -> OBJ 
trashed, aliases: set OBJ 
| 
{drive, trash} <= folders \ dom dir 
ran dir <= folders 
trashed = dir~+.{trash} 
not drive in trashed U {trash} 
aliases <= files 
aliases = dom links 
links+ & Id = {} 
files & folders = {} 
files U folders = OB) 
] 


Move (x, to: OBJ) = [ 
Finder 


X not in dir*.{to} 


39 


dir’ = dir (4+) {x -> ((inks* ;> aliases).to)} 
links’ = links 


] 


TrashingWorks (x, to: OBJ) :: [Finder | 
Move (x, to) 
and to in trashed U {trash} 
=> X in trashed’ 


Style 
[style, format] 


Tree =[ 
based : style -> style 
const normal : set style 
| 
normal & dom based = {} 
ran based \ normal <= dom based 
based+ & Id = {} 
] 


Sheet = [ 
Tree 
delta, assoc : style -> format 
| 
normal <: assoc 
normal <; assoc = normal <; ((based ; assoc) (+) delta) 


] 


normal <: delta 


ChgParent (s, from, to : style) = [ 
Sheet 
| 


s in dom based and from = based.s 
based' = based (+) {s -> to} 

assoc’ = assoc 

{s} <; delta = {s} <; delta’ 

assoc.to = assoc.from => delta’ = delta 


] 
Xi 0= [const Sheet] 


Claim (s, from, to : style) :: 
[Style | ChgParent (s, from, to) ; ChgParent (s, to, from) => XiQ] 


Allocate 


[USER, RESOURCE] 


Bookings = [ 
reservedBy : RESOURCE -> USER 
pending, granted, free : set RESOURCE 
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reserved : set RESOURCE 
| 

reserved = dom reservedBy 

pending & granted & free = {} 

pending & free = {} 

granted & free = {} 

pending U granted U free = RESOURCE 
] 


Resources = [ 
open, closed : set RESOURCE 
const overlap, incons, excludes : RESOURCE <-> RESOURCE 
excluded : set RESOURCE 

| 
overlap~ = overlap and overlap & Id = {} 
incons~ = incons and incons & Id = {} 
excludes = incons U overlap 
incons.open <= closed 
excluded = excludes.open 
open & closed = {} 
open U closed = RESOURCE 


Users = [ 
usedBy : RESOURCE -> USER 
used : set RESOURCE 

| 
used = dom usedBy 


] 


Allocate (r : RESOURCE; u : USER) = [ 
Resources Users Bookings 

| 
rin (pending & open) \ (excludes.reserved) 
{r -> u} <= reservedBy 
granted’ = granted U {r} 
reservedBy’ = reservedBy 
usedBy' = usedBy U {r -> u} 


AllocSafeO (r : RESOURCE; u : USER) :: [ 

Resources Users Bookings 
| 

Allocate (r, u) and (excludes.reserved & reserved = {}) 
=> excludes.reserved' & reserved' = {} 


] 


AllocSafel (r : RESOURCE; u : USER) :: [ 
Resources Users Bookings 


Al 


Allocate (r, u) and (excluded & granted= {}) 
=>excluded' & granted’ = {} 


] 


AllocSafe2 (r : RESOURCE; u : USER) :: [ 
Resources Users Bookings 
| 
Allocate (r, u) and (incons.used & used = {}) 
=> (incons.used'’ & used’ = {}) 


] 


AllocSafe3 (r : RESOURCE; u : USER) :: [ 

Resources Users Bookings 
| 

Allocate (r, u) and (incons U overlap).used & used = {} 
=> (incons U overlap).used' & used’ = {} 


] 


Mobile IP 
[HOST, MSG, TS] 


net =[ 
router: HOST 
cached: set HOST 
subh: set HOST 
clock: TS 
caches: HOST -> HOST 
cache_exp_time: HOST -> TS 
updates: set MSG 
to, from, where: MSG -> HOST 
send_time, exp_time: MSG -> TS 
const precedes: TS -> TS 
const before: TS <-> TS 


dom to = updates 

dom from = updates 

dom where = updates 

dom send_time = updates 

dom exp_time = updates 

dom cache_exp_time = dom caches 

exp_time <= send_time; precedes+ 

caches & Id = {} 

(from; from~) & (to; to~) & (send_time; send_time~) <= Id 
(from~; to) & Id = {} 


/* axiomatize the time ordering as a total order */ 
/* for now, make do with weaker constraint */ 
before = precedes+ 

before & Id = {} 
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mh_arrive (h:HOST; m:MSG; t:TS) = [net | 
not router = h 
router’ = h 
not m in updates 
tin (before.{clock}) 
clock’ in (before.{clock}) 
subh <= cached 
cached’ = subh 
cache_exp_time’ = (cached' <: cache_exp_time) :> (before.{clock’}) 
caches’ = dom(cache_exp_time’) <: caches 
updates’ = updates U {m} 
send_time’ = send_time U {m -> clock} 
exp_time’ = exp_time U {m -> t} 
to' = to U {m -> router} 
from' = from U {m -> h} 
where' = where U {m -> h} 


update_arrival (m:MSG) = [net | 

clock’ in before.{clock} 

subh <= cached 

cached’ = {to.m} U subh 

cache_exp_time’ = (cached' <: (cache_exp_time (+) {to.m -> exp_time.m})) :> 
(before.{clock'}) 

caches’ = dom (cache_exp_time’) <: (caches (+) {to.m -> where.m}) 

router’ = router 

updates’ = updates 

to’ = to 

from’ = from 

where’ = where 

send_time’ = send_time 

exp_time’ = exp_time 


acyclic_caches = [net | caches+ & Id = {}] 


host_move_OK (h:HOST; m:MSG; t:TS) :: 
[net | acyclic_caches and mh_arrive (h, m, t) => acyclic_caches’] 


loc_update_OK (m:MSG) :: 
[net | acyclic_caches and update_arrival (m) => acyclic_caches’] 
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